In this notebook, we will use an envisionbox Python package called "envisionhgdetector", which contains functions to automatically annotate gestures. In another envisionbox module on training a gesture classifier we demonstrated an end-to-end pipeline for training a model on particular human behaviors (e.g., head nodding, clapping) and then producing inferences on new videos. This package builds on that work: we have trained a convolutional neural network to differentiate between no gesture (including self-adaptors) and a gesture. Training was based on the SAGA dataset, the Zhubo dataset, and the TED M3D dataset. Because the model was trained on some variability in datasets and camera angles, and on more than 9000 gestures, the detector can be applied in somewhat more varied settings than if it had been trained on a single dataset.
Now, don't get too excited! The performance is not extraordinary, and the detector still awaits proper testing and further updating with better-trained models (we are working on it...). It currently does not differentiate between types of gestures (as far as that is possible; we are working on it...). But it is good enough for some purposes: a quick pass over a set of videos to get the prominent gestures out. Once we have the gestures, we can do all kinds of other interesting things, e.g., generate gesture kinematic statistics or gesture networks. And now all automatically!
https://pypi.org/project/envisionhgdetector/
It is best to install in a conda environment.
conda create -n envision python=3.9
conda activate envision
Then proceed:
pip install -r requirements.txt
If you use this package, please cite:
Original Noddingpigeon Training code:
Zhubo dataset (used for training):
SAGA dataset (used for training):
TED M3D:
MediaPipe:
For this tutorial, I have two videos that I would like to segment for hand gestures. They both live in the folder './videos_to_label/'.
import os
import glob
from IPython.display import Video

videofoldertoday = './videos_to_label/'
outputfolder = './output/'

# List all videos in the folder
videos = glob.glob(videofoldertoday + '*.mp4')
# Display single video
Video(videos[0], embed=True, width=200)
Video(videos[1], embed=True, width=200)
From the PyPI package info we see that we can get started like this:
from envisionhgdetector import GestureDetector
# Initialize detector
detector = GestureDetector(
    motion_threshold=0.8,   # sensitivity to motion
    gesture_threshold=0.8,  # confidence threshold for gestures
    min_gap_s=0.3,          # minimum gap between gestures (seconds)
    min_length_s=0.3        # minimum gesture duration (seconds)
)
# Process videos
results = detector.process_folder(
video_folder="path/to/videos",
output_folder="path/to/output"
)
The gesture annotations can be fine-tuned via the detector's settings:
from envisionhgdetector import GestureDetector
import os
# absolute paths
videofoldertoday = os.path.abspath('./videos_to_label/')
outputfolder = os.path.abspath('./output/')
# create a detector object
detector = GestureDetector(motion_threshold=0.9, gesture_threshold=0.9, min_gap_s=0.2, min_length_s=0.5)
# just do the detection on the folder
detector.process_folder(
    input_folder=videofoldertoday,
    output_folder=outputfolder,
)
WARNING:tensorflow:From c:\Users\u668173\Anaconda3\envs\envision\lib\site-packages\keras\src\backend\tensorflow\core.py:222: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
Successfully loaded weights from c:\Users\u668173\Anaconda3\envs\envision\lib\site-packages\envisionhgdetector\model\SAGAplus_gesturenogesture_trained_binaryCNNmodel_weightsv1.h5
Processing videoplayback (2).mp4...
Generating labeled video...
Generating elan file...
Done processing videoplayback (2).mp4, go look in the output folder
Processing videoplayback (2)_2_1.mp4...
Generating labeled video...
Generating elan file...
Done processing videoplayback (2)_2_1.mp4, go look in the output folder
{'videoplayback (2).mp4': {'stats': {'average_motion': 0.8553642712561695,
'average_gesture': 0.966857220036815,
'average_move': 0.03314277811700271},
'output_path': 'c:\\Users\\u668173\\Desktop\\wimpouwenvisionboxwp\\envisionBOX_modulesWP\\UsingEnvisionHGdetector_package\\output\\videoplayback (2).mp4.eaf'},
'videoplayback (2)_2_1.mp4': {'stats': {'average_motion': 0.7676855239374508,
'average_gesture': 0.9752756533218406,
'average_move': 0.024724344355299285},
'output_path': 'c:\\Users\\u668173\\Desktop\\wimpouwenvisionboxwp\\envisionBOX_modulesWP\\UsingEnvisionHGdetector_package\\output\\videoplayback (2)_2_1.mp4.eaf'}}
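The returned dictionary can be flattened into a small per-video summary table. A sketch with pandas, using a toy copy of the dictionary's structure (the video names and values here are made up):

```python
import pandas as pd

# Toy stand-in for the dictionary returned by process_folder()
results = {
    "video_a.mp4": {"stats": {"average_motion": 0.86,
                              "average_gesture": 0.97,
                              "average_move": 0.03}},
    "video_b.mp4": {"stats": {"average_motion": 0.77,
                              "average_gesture": 0.98,
                              "average_move": 0.02}},
}

# One row per video, one column per statistic
summary = pd.DataFrame(
    {video: info["stats"] for video, info in results.items()}
).T
print(summary)
```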
import pandas as pd
import os
# let's list the output files
outputfiles = glob.glob(outputfolder + '/*')
for file in outputfiles:
    print(os.path.basename(file))
# load one of the segment files
csvfilessegments = glob.glob(outputfolder + '/*segments.csv')
df = pd.read_csv(csvfilessegments[0])
df.head()
labeled_videoplayback (2).mp4
labeled_videoplayback (2)_2_1.mp4
videoplayback (2).mp4.eaf
videoplayback (2).mp4_predictions.csv
videoplayback (2).mp4_segments.csv
videoplayback (2)_2_1.mp4.eaf
videoplayback (2)_2_1.mp4_predictions.csv
videoplayback (2)_2_1.mp4_segments.csv
|   | start_time | end_time | labelid | label | duration |
|---|---|---|---|---|---|
| 0 | 0.000000 | 0.689655 | 1 | Gesture | 0.689655 |
| 1 | 1.413793 | 3.896552 | 2 | Gesture | 2.482759 |
| 2 | 5.000000 | 5.724138 | 3 | Gesture | 0.724138 |
| 3 | 6.931034 | 7.517241 | 4 | Gesture | 0.586207 |
| 4 | 7.551724 | 8.379310 | 5 | Gesture | 0.827586 |
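As a quick illustration of the "gesture kinematic statistics" mentioned earlier, a few summary numbers can already be computed from such a segments table. Here the table is rebuilt from the rows shown above rather than read from disk:

```python
import pandas as pd

# The five segment rows printed above
df = pd.DataFrame({
    "start_time": [0.000000, 1.413793, 5.000000, 6.931034, 7.551724],
    "end_time":   [0.689655, 3.896552, 5.724138, 7.517241, 8.379310],
    "label":      ["Gesture"] * 5,
})
df["duration"] = df["end_time"] - df["start_time"]

# Simple summary statistics over the detected gestures
n_gestures = len(df)
mean_dur = df["duration"].mean()
total_gesture_time = df["duration"].sum()
print(f"{n_gestures} gestures, mean duration {mean_dur:.2f} s, "
      f"total gesture time {total_gesture_time:.2f} s")
```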
from moviepy import VideoFileClip
videoslabeled = glob.glob(outputfolder + '/*.mp4')
# need to rerender; make sure the temp folder exists first
os.makedirs("./temp", exist_ok=True)
clip = VideoFileClip(videoslabeled[1])
clip.write_videofile("./temp/example_2_labeled.mp4")
Video("./temp/example_2_labeled.mp4", embed=True)
MoviePy - Building video ./temp/example_2_labeled.mp4.
MoviePy - Writing video ./temp/example_2_labeled.mp4
MoviePy - Done !
MoviePy - video ready ./temp/example_2_labeled.mp4
# need to rerender; make sure the temp folder exists first
os.makedirs("./temp", exist_ok=True)
clip = VideoFileClip(videoslabeled[0])
clip.write_videofile("./temp/example_1_labeled.mp4")
Video("./temp/example_1_labeled.mp4", embed=True)
MoviePy - Building video ./temp/example_1_labeled.mp4.
MoviePy - Writing video ./temp/example_1_labeled.mp4
MoviePy - Done !
MoviePy - video ready ./temp/example_1_labeled.mp4
It is important to test the accuracy of your classifier against hand-labeled data that was not used to train the model. You would then report a confusion matrix (e.g., false positive rate, hits, etc.) or the machine-human interrater reliability. In the future I would like to add such code to this module, as well as train a more general model for detecting gestures. If you have data suitable for this and would like to use it, you can contact me (wim.pouw@donders.ru.nl). In general, it would be great to hear whether this module is valuable for your behaviors of interest, and what the boundary conditions of this pipeline are.
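As a starting point, machine-human agreement could be computed frame by frame. A minimal sketch with a hand-rolled Cohen's kappa (no extra dependencies); the label sequences here are toy data, not real annotations:

```python
def cohens_kappa(a, b):
    """Cohen's kappa between two equal-length label sequences."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    labels = set(a) | set(b)
    # Observed agreement: proportion of frames with identical labels
    po = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of marginal label proportions
    pe = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (po - pe) / (1 - pe)

# Toy frame-level labels from the detector and a human coder
machine = ["Gesture", "Gesture", "None", "None", "Gesture", "None"]
human   = ["Gesture", "None",    "None", "None", "Gesture", "None"]
print(round(cohens_kappa(machine, human), 3))  # 0.667
```

In practice, you would derive such frame-level sequences by sampling the detector's segments and the human ELAN annotations at a fixed rate (e.g., the video frame rate) before comparing them.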